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Abstract. In an inverse problem, such as the determination of brain activity given 
magnetic field measurements outside the head, the main quantity of interest is often 
the power associated with a source. The 'standard' way to determine this has been 
to find the best linear estimate of the source and calculate the power associated with 
this. This paper proposes an alternative method and then relationship to this previous 
method of estimation is explored both algebraically and by numerical simulation. 

In abstract terms the problem can be stated as follows. Let H he a, Hilbert space 
with inner product ( , ). Let L be a linear map: if ^ M". Suppose that we are given 
data b € M" such that b = Lx + e where e is a vector of random variables with zero 
mean and given covariance matrix which represents measurement errors. The problem 
that is addressed in this paper is to estimate (x, Xx) where X is an operator on H 
(e.g. the characteristic function of a region of interest). 
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1. Introduction 

This paper solves a problem that arose in the study of the inverse problem in 
magnetoencephalography (MEG) 0. The dominant concern in MEG analysis has 
been to produce source maps of current density in the brain and to co-register these to 
anatomical data (e.g. |]l], H). However, this may not be the most appropriate approach 
when there is a focus on specific source regions in the brain, e.g. the thalamus, fusiform 
gyrus etc. In these cases it may be more appropriate to generate an activation curve, 
a graph of the power dissipated in a specified region as a function of time. Several 
methods of generating activation curves have been proposed (e.g. @, il P). This aim 
of this paper is to derive an algorithm for generating activation curves that is optimal 
with respect to the L2-norm. 

Another argument for the use of activation curves is the direct comparison with 
other functional brain imaging modalities such as positron emission tomography (PET) 
and functional magnetic resonance imaging (fMRI). These modalities produce images 
of quantities, e.g. regional cerebral blood flow (rCBF), that are correlated with power 
dissipated rather than current density. This suggests that in order to compare results 
across modalities we should use magnetic field data to produce an estimate of the power 
dissipated, i.e. an activation curve. 

In Section 2 a more general problem is solved in the setting of a linear map from a 
Hilbert space to a finite dimensional Hilbert space. The main result from Section 2 (i.e. 



Equation |T6|) can be applied independently to each time instant of the data from a MEG 
experiment. The method proposed is to find a matrix Y such that b^Yb approximates 
{x,Xx) denotes matrix transposition). The derivation of the optimal matrix Y 
(Equation |16D with respect to the L2-norm is contained in Section 2. Section 3 goes on 
to compare the main results of Section 2 with the naive algorithm which first computes 
an estimate, xreg, using Tikhonov regularization and then computes (xreg, -^a^reg)- 
This algorithm was used in 0] to extract measures of brain activity. 

In Section 4 we specialize to the study of the MEG inverse problem. Definitions 
appropriate to this application are introduced and a simulation study is described. In 
Section 5 an important special case is considered where the region of interest is the 
whole brain. A simplified equation (Equation ^2]) for this case is derived and this is 
compared with the total signal power which is commonly used as an estimate of brain 
activity. Section 6 is a discussion of the merits of the algorithm together with the issues 
to be addressed before applying the method in practice. 

2. Methods 

Let if be a Hilbert space with inner product ( , ). Let L be a linear map: H —>■ M". 
Suppose that we are given data 6 G such that 

b = Lx + e (1) 
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where e is an unknown vector of random variables with zero mean and covariance matrix 
C which represents measurement error. Suppose that the problem of finding an x E H 
corresponding to a 6 G is an ill-posed problem. The problem here is to estimate 
{x, Xx) where X is an operator on H. 

It should be noted that no assumptions are made about the noise in the 
measurement channels other than it has zero mean and a well defined covariance matrix 
C, i.e. if the measurement noise is denoted by a vector e then the covariance matrix is 
defined by Cij = e^ej where ~ denotes an expectation value. 

Now define the adjoint map by 

{x, L^b) = {Lxfb, for a\lxeH,be W. (2) 

Here we are concerned with the image space X of L^. Let {et : i = 1 . . .n} he the usual 
basis of and choose a corresponding basis of X, {ipi : i = 1 . . .n}, where ipi = L'^ei. 

The matrix Y will be chosen to minimize the error for points in X. The starting point 
in choosing an optimal matrix Y is to derive a suitable cost function to be minimized. 
We start by expanding b^Yb. 

b^Yb = {Lx + efY{Lx + e) = {LxfYLx + e^YLx + {LxfYe + e^ye(3) 

As mentioned above we focus on points in X C if , so we express a; G X in terms of our 
basis: x = Y17=i where Oj G M are scalars which will be written collectively as a 
vector a. Equation ^ can be simplified because the expression Lx appears repeatedly, 
so start by simplifying this expression: 

n n 

{Lx)'^ej = {x,Vej) = {(^^ai'^p^^ ,ipj) = ^ai{^|J^,'^pj) . (4) 

i=l i=l 

The right hand side of Equation ^ can be written as the jth component of a product 
Pa where Pij = {ipi^ipj). Note that P is a symmetric positive definite n x n matrix. 
Substituting for Lx in Equation |] gives: 

b'^Yb = a^PYPa + e^YPa + a^PYe + e^Ye. (5) 

The projection of the operator X onto X has a matrix representation with respect to the 
basis {ipi} defined by Xij = {ipi, Xipj) where i, j = 1, . . . ,n. Hence the target expression 
can be written in terms of the vector a: 

n 

{x,Xx) = a'^Xa, where x = ^^a^'j/'j. (6) 

i=l 

For F to be a good estimator, the right hand sides of Equations |] and |] should be 'close' 
for all a G M". One way of achieving this is to minimize the cost function E defined by: 

E=\\X - PYP\\l + \\e^YP\\l + \\PYe\\l + \\e^Ye\\l (7) 

where || II2 is the L2-norm. Equation ^ can be interpreted in physical terms. The first 
term is the error in approximating the operator X by Y. The second and third terms 
give a measure of the overlap, induced by Y, between the measurement error and the 
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imaging space, X. Note that these terms are equal for a symmetric Y. The fourth term 
is a measure of how Y magnifies the measurement error. 

To minimize E, OE/dYik is derived for each element of the matrix Y . This gives 
N"^ equations to solve for the iV^ unknowns 1^^. These may be written as a single 
matrix equation. In order to illustrate the manipulations involved, the method will be 
elaborated for the fourth term in Equation |^. The fourth term is expanded using the 
definition of the L2-norm: 

\\e^y4l = ($^e„y,^e^)'. (8) 

a,l3 

This is differentiated to obtain: 
d\\e^Ye\\l 



dYa 



We proceed by replacing the products of random variables with their expectation values, 
i.e. e~e^ = and e^ = C^k- 

^ o-,^ ^ = '^'y^^CicXafiCpk- (10) 

This is the zfcth term of the matrix product CYC. Similarly, all of the other terms in 
Equation |^, when differentiated, give terms that can be written as the ikih elements of 
a product. So, the equations can be collected as: 

- 2PXP + 2p2yp2 ^ 2P'^YC + 2CYP'^ + 2CYC = 0. (11) 

This may be written in the form: 

(p2 + C)r(p2 + C) = PXP. (12) 

This equation can be solved in many ways, for example by defining Z = Y{P^ + C) and 
solving for Z first and then for Y. This easily implemented procedure was rejected as 
it computes an non-symmetric Y when starting with a symmetric matrix X, because 
of the numerical problems associated with ill-conditioned matrices. So an alternative 
scheme which preserves symmetry was devised. Let Aj be the eigenvalue of the matrix 
P with eigenvector 0j. Then the matrices X and C can be represented with respect to 
the basis {(pi} as new matrices X'and C, i.e. 

X=Y1 '^^^'^kfk. where X',, = </)f X</)„ (13) 

ik 

C=Y1 <^iC'^kfk. where C\u = ^JC<P,. (14) 

ik 

With these definitions, the matrix Y can be finally expressed as: 

Y={P' + Cr'p[Yl ^^X'^k^t)nP' + C)-' (15) 
ik 

= J2 KK<P^iC'+ \^i)-'xr,iC'+ Xliy'cfl (16) 



ik 
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The matrix Y computed using the above formula is always symmetric for a given 
input symmetric matrix X. 

Frequently the covariance matrix C is not known and the assumption is made that 
the random variables Cj are independent Gaussian random variables with a variance ( 
that is considered to be a parameter of the method. With this assumption C = (I and 
Equation R becomes: 

Y = y J' ^ J' A ^^k^l (17) 

3. Comparison with naiVe method 



We now compare Equation [T^ with the corresponding equation derived by the naive 
method mentioned in the introduction. The naive method for computing (x, Xx) is to 
compute a minimum norm estimate using Tikhonov regularization to get Xreg and then 
compute the inner product. 

To compute a xreg the first step is to choose a finite dimensional subspace i? C if 
that has an orthonormal basis {va : a = 1, . . .m}. The subspace R will be called the 
representation space and the regularized solution xreg will lie in this space. The linear 
map L : H —>■ M" defines a linear map from R to M" by restriction that we will also call 
L. 

Now compute a singular value decomposition of L : i? ^ M" as L = UT^V'^, where 
S is a diagonal matrix with non- negative entries di, cr2, . . . o"n and U and V are matrices 
with orthonormal columns, i.e. U'^U = V'^V = I. Applying Tikhonov regularization 
to the inverse problem gives xreg = VDU^b, where D is a diagonal matrix given by 
D = (S^ + So the power dissipated by this source can be computed by: 

(xreg, Xxreg) = (b^UDV^) X [VDU^b) , (18) 

where X is the matrix representation of the operator X on i?, i.e. X^fi = {ra,Xrp). 
The right hand side of Equation 18 is of the form b^Yb where Y is defined to be: 

Y = UDV^XVDU^ (19) 

The comparison with the method in the previous section relies on the relationship 
between the linear operator L and the Gram-Schmidt matrix P that we will now derive. 
Suppose for a moment that the representation space R was the whole of H and that the 
basis {va} is a complete orthonormal basis for R = H. In this case: 

Pij = = ^(?/'i,r„)(r„, V'j), by completeness, (20) 

a 

= '^^{L^ei,ra){ra, L^Cj), using the definition of ?/'(,21) 

a 

= ej {Lra){Lra)^ej, using the definition of L(^2) 

a 

= ef ^ ^ r^rl^ L^ej , by linearity, (23) 
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= ejLL'^ei. (24) 

The right hand side of this equation is the ijth component of the matrix product LL^ . 
So under the assumption that {va} is a complete orthonormal basis for H then P = LL^ . 

Now returning to the case when R G H we can see that for a good choice of R the 
matrix P defined to be LL^ will be approximately equal to P. This is not surprising 
since to compute the Gram-Schmidt matrix P on a computer one usually takes a suitable 
representation space R and computes LL^ . The singular value decomposition of L 
immediately gives an eigenvalue decomposition of P since 

p = LL^ = uj:v^vj:u^ = ut?u^, (25) 

where the last equality follows from the fact that the columns of V are orthonormal. So 
the matrix P has eigenvalues af with eigenvectors, 0j given by the columns of U . 

By a similar argument to the above it can be seen that the matrix X' defined to 
be XV approximates the matrix X' so we have: 

Y = UDXDU^ = -^,^^4>^X[J>l, (26) 
ik + ^fc + ^ 

Now we can compare Equation ^ with Equation |T^. For a good representation space 
R we have 0i ~ 0j, X' ~ X' and so the major difference between the two approaches 
is that Aj ~ af. The effect of this change can be seen by plotting out the graphs of 
the functions on the interval [0, 1] (this is the only range of interest since we could 
dividing by the largest singular value restrict to this interval). These graphs are shown 
in Figure |l| where it can be seen that Equation ^ attenuates the contribution from the 
small singular values and has a sharper cut-off than is the case for Equation The 
effect of this is that Equation p!7| should attenuate the noise component, which is usually 
associated with the small singular values. 



1.0 




0.0 0.5 X 1.0 



Figure 1. Graphs of the functions xjix^ + C) (solid curve) and jix^ + C) (dashed 
curve) for C = 0.5. 
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4. Application 

Now we apply our results to the MEG inverse problem, i.e. the problem of recovering 
information about source current density inside the brain given measurements of the 
magnetic field outside the brain. Let Q denote the brain volume. The Hilbert space of 
interest to us is, L2{fl), the space of square integrable vector fields defined on the brain 
volume Q together with the inner product: 

(/i,j2)= /^i^^^T^df, for all ^2(1^). (27) 

The factor uj{r) is a weighting factor that allows some flexibility in the procedure. The 
only restriction imposed on uj{f) is that the integral over each voxel is finite. In other 
papers the factor uj{r) has been interpreted as a probability weight |^. 

It is interesting in this context to look at the the spatial selectivity implicit in the 
use of the matrix Y as it varies in source space. Then the sensitivity profile of F at a 
point in source space, r^, is defined to be 

3 

J(fo) = ^(l4)^F(l4), (28) 
1=1 

where d^^ is the current dipole distribution, i.e. dl^{r) = 5(r— ro)ej where {cj : i = 1, 2, 3} 
is an orthogonal set of unit vectors and 6{ ) denotes the Dirac delta function. 

The spatial selectivity, /(ro), may be thought of as an instrumental generalization 
of the lead field of a single measurement channel. The definition is designed so that in 
the case when Yik = 1 when i = k = no and otherwise then the sensitivity I{ro) is the 
square of the magnitude of the lead field of channel no. Note that the above definition 
of I{ro) is different from the original definition proposed in 

To illustrate the method a simple simulated experimental system (Figure |^) has 
been investigated. The head is modelled as a homogeneous conducting sphere of radius 
8.9cm with its centre at (0,0, —0.07cm). The source space is a 9cmx9cm square thin 
lamina consisting of 33x33 voxels in the plane z = —0.01 cm with centre (0, 0, —0.01 cm). 
The measurement instrument is a hexagonal array of 37 second order axial gradiometers 
with baseline 5 cm with the lowest 'sensing' coils in the plane z = 4 cm. Now consider, 
in the context of the simulated system, the simplest possible region of interest operator 
X = 6{r — Tc) where fc = (0, 0, —0.01 cm) is the centre of source space. This type 
of operator might be adopted if one simply wished to focus on a small volume of 
source space. The matrix Y used as an estimator from this operator is calculated 
using Equation |1^. The sensitivity profile for this Y matrix is shown in Figure ^ 

The reconstruction of an activation curve has been tested on simulated data using 
this region of interest operator and simulated data from a time varying target dipole 
at (0,0,0 cm), i.e. 1cm from the region of interest. The moment of the dipole varies 
sinusoidally at 10 Hz, with an envelope that rises linearly from zero at 200 ms to a 
maximum at 300 ms after which it remains constant. To show the insensitivity to dipole 
orientation the dipole moment was made to rotate smoothly in a tangential plane — 
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Figure 2. (left) A plan view of the experiment geometry. Crosses denote source 
space voxels and diamonds denote the projections of the centres of the detector coils, 
(right) The sensitivity profile in source space of the Y matrix that is derived from the 
operator X = (5(r — r^). 



this rotation is not discernible in the activation curve. In addition to the target dipole 
there is distractor dipole at (0, 0.02 cm, 0), which is active from to 100ms (triangular 
envelope) and again from 400ms (square envelope). 

In the period from 200 ms to 400 ms when only the target dipole is active, the 
calculated (power) activation curve matches closely that of the target. However, the 
existence of the distractor dipole within the sensitive region (see Figure ^ gives rise to 
apparent activity between ms and 100 ms and inaccuracy in the calculated activation 
curve for the period after 400 ms. The distractor dipole adds to the estimated power 
dissipated when it is parallel to the target and subtracts when the target dipole has 
rotated to be anti-parallel. 

Error bars for the activation curve can be estimated using the last term in 
Equation ^ to give the amount of measurement noise reflected in the activation curve. 
The estimate is given by ^ Ca/3^a/3- 

5. Total brain activity 



As a special case of Equation the task of finding an estimate of the total activity in 
the source space is considered. In this case the operator X is the identity and so 

X,, = {i/j,, Xi/jj) = ^j) = P,, (29) 
So the matrix X' can be calculated as follows 

X',, = 0f X0^. = 0f P0^. = A^.0f 0,. = (30) 
where 6ij is the Kronecker delta. So, in this case, Y is given by the simplified formula: 
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Figure 3. Activation curves for a simulated experiment. Tlie solid line and the dotted 
lines are the activation curves of the target and distractor dipoles. The diamonds are 
the calculated activation curve from the Y matrix whose sensitivity profile is shown in 
Figure ^j. The error bars, omitted for clarity, would be approximately twice the height 
of the diamonds. 



This gives the following formula for computing the total activity. 

Total activity, A(t) = ^ (^2^^ (^fK^))' (32) 

where b{t) is the vector of measurements collected at time t. 

Previously when an estimate of the total brain activity was needed the power in 
the signals was used, i.e. 

Total signal power, B{t) = h{tfh{t) (33) 

These two methods have been compared for the simulated data described above as shown 
in Figure In Figure ^ it can be seen that the estimate A{t) (shown as the solid line 
on the left) more closely approximates the true activation of the dipoles (dashed curve) 
than the estimate B{t). In fact, if the error in the estimate is measured by the integral 
of the squared discrepancies between the curves then the error for A[t) is 2.6 x 10~^ 
whilst the error for B{t) is 6.6 x 10~^. 



6. Discussion 



We have shown that it is possible to directly compute the 'power' associated with a 
source without computing the source first. The method seems robust to noise and is not 
dependent on the noise having a Gaussian profile. Correlations between measurement 
channels are fully taken into account. In particular it was shown that activation curves 
of brain regions can obtained from magnetic field data. The method provides an easily 
computable way of tracking the power dissipated in a specific region of the brain. 
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Figure 4. (left) A comparison of the total brain activity, A{t), (solid line) with a plot 
of the power of the dipolar sources that generated the simulated data (dashed line). In 
order to compare with the right-hand diagram both curves are normalized to enclose a 
unit area, (right) A comparison of the total signal power, B{t), (solid line) with a plot 
of the power of the dipolar sources that generated the simulated data (dashed line). 
In order to compare with the left-hand diagram both curves are normalized to enclose 
a unit area. 



To use the method effectively the practical problem is to effectively estimate the 
covariance matrix. For evoked response experiments the covariance matrix, C, can 
be estimated from the prestimulus period. For other experiments it might be more 
suitable to make the a priori assumption that the noise is uncorrelated Gaussian noise 
with variance a that could be considered as a parameter. As a increases, the more 
closely the Y matrix sensitivity pattern matches the region of interest, but the larger 
the error bars on the resulting activation curve. 

Finally, to answer the question in the title, I would say that if best is interpreted 
in a least L2-norm sense then the answer is no. The best way to estimate the power 
associated with a source is to compute it directly. 
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